Agents Teaching Agents in Reinforcement Learning
Abstract
Using reinforcement learning [4] (RL), agents can autonomously learn a control policy to master sequential decision tasks. Rather than always learning tabula rasa, our recent work [5, 7, 8] considers how an experienced RL agent, the teacher, can help another RL agent, the student, to learn. As a motivating example, consider a household robot that has learned to perform tasks in a household. When the consumer purchases a new robot, she would like the new (student) robot to quickly learn to perform the same tasks as the existing (teacher) robot, even if the new robot has a different state representation, learning method, or manufacturer. Our goals are to: 1) allow the student to learn faster with the teacher than without it, 2) allow the student and teacher to have different learning methods and knowledge representations, 3) not limit the student’s performance when the teacher is sub-optimal, 4) not require a complex, shared language, and 5) limit the amount of communication required between the agents. Our approach was influenced by learning from demonstration [1] (LfD) and transfer learning [6] (TL). LfD methods typically do not achieve goals 3 and 5: they limit an agent’s performance to that of the demonstrator and require many trajectory demonstrations. The majority of TL methods assume that the trained agent knows the new agent’s learning method or knowledge representation, failing to meet goal 2, and assume direct access to the “brain” of the student agent, failing goals 4 or 5. We investigate how an RL agent can best teach another RL agent using a limited amount of advice, assuming that the teacher can observe the student’s state and that the student can receive (and execute) action advice from the teacher. The teacher can give advice a fixed number of times, but cannot observe or change anything internal to the student. This paper presents three of our teaching algorithms and shows a selection of results in the Ms. Pac-Man domain, although our work has also evaluated our methods in the Mountain Car and StarCraft domains. A key insight is that the same amount of advice, given at different moments, can have different effects on student learning. Results show our teaching methods can achieve all five of the above goals.
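The advice setting described in the abstract (a teacher that sees only the student's state, may suggest an action, and has a fixed advice budget) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in: `BudgetedTeacher`, `choose_action`, the tabular `q_values`, and the value-gap `threshold` are assumptions made for illustration, and the value-gap rule is not claimed to be one of the three algorithms the paper presents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Optional, Sequence

State = Hashable
Action = int


@dataclass
class BudgetedTeacher:
    """Hypothetical teacher that may give action advice at most `budget` times."""

    # Teacher's own learned action values, assumed tabular for this sketch.
    q_values: Dict[State, Sequence[float]]
    budget: int
    # Assumption: advise only when the best and worst actions differ by at
    # least this much, i.e. when the choice of action appears to matter.
    threshold: float = 0.0

    def advise(self, state: State) -> Optional[Action]:
        """Return an advised action, or None when no advice is given."""
        if self.budget <= 0 or state not in self.q_values:
            return None
        q = self.q_values[state]
        if max(q) - min(q) < self.threshold:
            return None  # save the budget for a more consequential state
        self.budget -= 1
        # Advise the action the teacher itself values most in this state.
        return max(range(len(q)), key=q.__getitem__)


def choose_action(
    student_policy: Callable[[State], Action],
    teacher: BudgetedTeacher,
    state: State,
) -> Action:
    """The student executes the teacher's advice when offered, else its own choice."""
    advice = teacher.advise(state)
    return advice if advice is not None else student_policy(state)
```

With `threshold=0.0` the sketch spends its whole budget on the first states it sees; a positive threshold holds advice back for states where the teacher's action values differ more. Comparing the two is one way to see why the timing of a fixed amount of advice can matter. Note that the teacher reads only the student's state and never inspects or modifies the student's internals.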
Similar Resources
Hierarchical Functional Concepts for Knowledge Transfer among Reinforcement Learning Agents
This article introduces the notions of functional space and concept as a means of knowledge representation and abstraction for Reinforcement Learning agents. These definitions are used as a tool for knowledge transfer among agents. The agents are assumed to be heterogeneous; they have different state spaces but share the same dynamics, reward, and action space. In other words, the agents are assumed t...
Online Transfer Learning in Reinforcement Learning Domains
This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with ...
Agents Teaching Humans in Reinforcement Learning Tasks
This paper extends our existing teacher-student framework to allow a knowledgeable agent to teach human students. An agent teacher instructs a human student by suggesting actions the student should take as it learns. This paper extends previous algorithms, used for agents teaching other agents, to develop several new algorithms for agents teaching humans. Our results in the Pac-Man domain show ...
A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem
Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...
Outsourcing or Insourcing of Transportation System Evaluation Using Intelligent Agents Approach
Nowadays, outsourcing is viewed as a trade strategy, and organizations tend to adopt new strategies to achieve competitive advantages in the current world of business. Focusing on main competencies and transferring most activities to resources outside the organization (outsourcing) is one such strategy. In this paper, we aim to decide on decision maker agent of transportation system, by a...
Agent-Agnostic Human-in-the-Loop Reinforcement Learning
Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. To this end, prior work has developed teaching protocols that enable agents to learn efficiently in complex environments. In many of these methods, the teacher’s guidance is tailored to agents with a particular representation or underlying learning scheme, offering effective but high...